Abstract

Consider a color camera mounted at a fixed point and a
target object to be tracked. We propose a method to replace
the common "blue screen" technique for segmenting the
object. The target object in the video sequence is separated
from the background by a segmentation process based on
color mixture models. The background information is
recovered through a similarity search against a panorama of
the original background. The foreground segment can then be
encoded by traditional compression, while the scene
background is represented as a panorama. Finally, the
foreground object is combined on-the-fly with the
corresponding panoramic segment, or with any desired image,
to reconstruct the video frame. Our system can be used in
low-bandwidth applications as well as in special
demonstrations in which the demonstrator can appear or
disappear at will.



1 Introduction 

Traditionally, video superimposition relies on color-keying
methods. This "blue screen" technique requires special
preparation of the studio background, and the foreground
people or objects must not contain any blue. The actor cannot
leave the studio, which limits flexibility in filmmaking. In
this work, we propose a new method that replaces color keying
by performing segmentation with the help of color modelling
and a background panorama. We describe a new coding scheme
based on a layering concept: a foreground layer with moving
objects on top of a background panorama mosaic image of the
scene. The background scene mosaic is constructed first. For
each frame, the foreground object is segmented and registered
using color mixture models. The two layers are handled
separately until reconstruction, and the background panorama
can be replaced by any desired image. Our approach can be
considered a "scenic" rather than a "color" or "chroma"
keying technique.



Panorama images have been used for video sequence
representation [1][2][7][8]. As successive frames usually
overlap by a large amount, a panorama significantly reduces
the storage needed to represent the scene. Color offers many
advantages over other cues in visual perception, such as
tolerance of partial occlusion and of resolution and scale
distortion. It has been used for segmentation [4], tracking
and recognition [3][6]. In our approach, both the background
scene and the foreground objects are modelled using color
mixtures.




Figure 1. Source video clip 1



This paper is organized as follows. Foreground segmentation
and registration are discussed in Section 2. Video frame
reconstruction is described in Section 3. Our experimental
results, discussion and future directions are in Section 4.

2 Foreground Segmentation & Registration 




Figure 2. Panorama mosaic sections

Figure 1 shows one of our video clips. The first step is to
construct the panorama mosaic for the background layer, as
shown in Figure 2, using a set of 16 regular photos. After
that, our segmentation algorithm consists of several steps:

1. Color modelling of the foreground object.

2. Color modelling of the $i^{th}$ frame background view.






0-7695-0750-6/00 $10.00 © 2000 IEEE 






3. Segment the foreground object from the $i^{th}$ frame using
the posterior probabilities from the two mixture densities.

4. According to the remaining part, extract the search region
from the $(i+1)^{th}$ frame to search for the next background
view in the panorama.

5. Go back to step 2 until all video frames are processed.

2.1 Foreground Object Color Mixture Models




Figure 3. Source frame and foreground object 



The foreground object is modelled only once, before
processing, and the model is used for the entire sequence.
Figure 3 shows one of the original frames and the foreground
object from video clip 1. This sequence shows the foreground
object, a man, walking around in the background scene, the
roof of a building. In our work, we use an effective
semi-parametric technique, Gaussian mixture models, for color
density estimation. The conditional density for a pixel $x$
belonging to a given class is modelled as a mixture with $n$
component densities. Our problem can then be viewed as a
parametric family of finite mixture densities:

$$p(x \mid \Phi) = \sum_{i=1}^{n} \alpha_i \, p(x \mid \phi_i) \tag{1}$$

where $\alpha_i$ is the mixing parameter or weighting factor
that corresponds to the prior probability that pixel $x$ is
generated by component $i$, with $\sum_{i=1}^{n} \alpha_i = 1$,
and $\Phi$ is the collection of the mixture parameters. In a
2D Hue-Saturation (HS) color space, each mixture component can
be viewed as a Gaussian:



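$$p(x \mid \phi_i) = \frac{1}{2\pi |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right) \tag{2}$$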



where $\mu_i$ and $\Sigma_i$ are the mean and the covariance
matrix of the Gaussian respectively. Now we have to estimate
the mixture collection $\Phi$. Expectation-Maximization (EM)
is an iterative procedure for numerically approximating
maximum likelihood estimates for mixture density problems
[5]. The estimation process can be adaptive as well, i.e.,
$\phi_i$ is non-stationary and new $\phi_i$ may appear
throughout the sampling process. Once a model has been learned
by EM, it can be converted into a look-up table for efficient
color probability indexing. The EM algorithm can be summarized
as two iterative steps: the expectation E-step and the
maximization M-step. Given a current approximation $\Phi'$ of
the mixture, we can obtain the next approximation $\Phi^{+}$
from:

E-step: Determine $E(\log f(y \mid \Phi) \mid x, \Phi')$

M-step: Choose $\Phi^{+} \in \arg\max_{\Phi} E(\log f(y \mid \Phi) \mid x, \Phi')$
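The mixture density, the EM iteration and the look-up table conversion described above can be sketched as follows. This is a minimal illustration rather than the implementation used in our experiments. It assumes hue and saturation are both scaled to [0, 1], ignores the circular nature of hue, and uses plain batch EM instead of the adaptive variant. The function names, the initial covariance and the regularisation term `eps` are hypothetical choices made for the example.

```python
import math
import random


def gaussian_2d(x, mean, cov):
    """Bivariate Gaussian density, Eq. (2); cov = ((shh, shs), (shs, sss))."""
    (shh, shs), (_, sss) = cov
    det = shh * sss - shs * shs
    dh, ds = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), written out for 2x2.
    q = (sss * dh * dh - 2.0 * shs * dh * ds + shh * ds * ds) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def mixture_density(x, weights, means, covs):
    """Finite mixture density, Eq. (1)."""
    return sum(w * gaussian_2d(x, m, c) for w, m, c in zip(weights, means, covs))


def em_fit(points, n, iters=50, init_means=None, seed=0, eps=1e-6):
    """Fit an n-component mixture to (h, s) samples by plain batch EM."""
    rng = random.Random(seed)
    means = list(init_means) if init_means else rng.sample(points, n)
    covs = [((0.01, 0.0), (0.0, 0.01))] * n  # assumed initial spread
    weights = [1.0 / n] * n
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in points:
            p = [w * gaussian_2d(x, m, c) for w, m, c in zip(weights, means, covs)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in p])
        # M-step: re-estimate weights, means and covariances.
        weights, means, covs = [], [], []
        for k in range(n):
            r = [row[k] for row in resp]
            nk = sum(r) + 1e-12
            mh = sum(rk * x[0] for rk, x in zip(r, points)) / nk
            ms = sum(rk * x[1] for rk, x in zip(r, points)) / nk
            shh = sum(rk * (x[0] - mh) ** 2 for rk, x in zip(r, points)) / nk + eps
            sss = sum(rk * (x[1] - ms) ** 2 for rk, x in zip(r, points)) / nk + eps
            shs = sum(rk * (x[0] - mh) * (x[1] - ms) for rk, x in zip(r, points)) / nk
            weights.append(nk / len(points))
            means.append((mh, ms))
            covs.append(((shh, shs), (shs, sss)))
    return weights, means, covs


def build_lut(weights, means, covs, bins=32):
    """Tabulate the mixture density at bin centres of a bins x bins HS grid."""
    centres = [(k + 0.5) / bins for k in range(bins)]
    return [[mixture_density((h, s), weights, means, covs) for s in centres]
            for h in centres]
```

Once the table is built, the density of a pixel with hue `h` and saturation `s` is read as `lut[int(h * bins)][int(s * bins)]`, which avoids evaluating the exponentials per pixel.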

2.2 Background Scene Color Mixture Models 




Figure 4. Background from panorama 



The color distribution of the background view is modelled in
addition to that of the foreground object $I_{obj}$. Figure 4
shows the background view. For the $i^{th}$ frame of the
sequence, $I_{frame}(i)$, we have the corresponding background
viewing window of the panorama, $I_{pano}(\phi)$, at panning
view angle $\phi$. We assume that the correct background view
$I_{pano}(0)$ of the first frame $I_{frame}(0)$ is given, with
panning view angle $\phi = 0$. We can now obtain the color
mixture model of the first background view $I_{pano}(0)$ using
the same methods as in the previous section. Given color
density estimates for both the foreground object $I_{obj}$ and
the background view $I_{pano}(\phi)$, the probability that a
pixel $x$ belongs to the foreground object can be calculated as
the posterior probability $P(I_{obj} \mid x)$:

$$P(I_{obj} \mid x) = \frac{p(x \mid I_{obj}) \, P(I_{obj})}{p(x \mid I_{obj}) \, P(I_{obj}) + p(x \mid I_{pano}(\phi)) \, P(I_{pano}(\phi))} \tag{3}$$



Therefore, a pixel $x$ is assigned to the class with the
maximum posterior probability, which minimizes the probability
of misclassification. That is, a pixel $x$ with
$P(I_{obj} \mid x) > 0.5$ is classified as foreground object,
and as background otherwise:

$$x \in \begin{cases} I_{obj} & \text{if } P(I_{obj} \mid x) > 0.5 \\ I_{pano}(\phi) & \text{otherwise} \end{cases} \tag{4}$$



The foreground regions $I_{fore}(i)$ are thus extracted. Figure
5 shows the extracted foreground regions on the left. Since
the background view changes throughout the sequence, we have
to find the correct background view for every remaining frame.
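The posterior of Eq. (3) and the decision rule of Eq. (4) can be sketched per pixel as below. This is an illustration only. It assumes the two class-conditional densities have already been evaluated for every pixel (for example from look-up tables), and it assumes equal class priors by default, which the text above does not specify. The function names and the `prior_obj` parameter are hypothetical.

```python
def posterior_foreground(p_x_obj, p_x_pano, prior_obj=0.5):
    """P(I_obj | x) from the two class-conditional densities, Eq. (3)."""
    numerator = p_x_obj * prior_obj
    denominator = numerator + p_x_pano * (1.0 - prior_obj)
    # If both densities are zero the pixel carries no evidence; call it background.
    return numerator / denominator if denominator > 0.0 else 0.0


def segment(lik_obj, lik_pano, prior_obj=0.5):
    """Binary mask per Eq. (4): True where the pixel is classified as foreground."""
    return [
        [posterior_foreground(a, b, prior_obj) > 0.5 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(lik_obj, lik_pano)
    ]
```

With equal priors the rule reduces to comparing the two densities directly, so the prior only matters when the foreground is expected to cover a known fraction of the frame.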
2.3 Searching the Next Background View




Figure 5. Segmentation results 



We assume the camera is fixed and pans horizontally. The frame
rate is fast enough that the background regions of two
consecutive frames are more or less the same. We can then "cut"
the search region from the next frame according to the previous
one. Figure 5 shows the segmented remaining regions on the
right, which are bounded by larger blocks. We denote the search
region as a template region $TR(I_{frame}(i))$, which is cut out
from $I_{frame}(i)$ according to the remaining part of the
segmentation of $I_{frame}(i-1)$. We then have a minimization
problem of $E_i$ in the Hue-Saturation-Intensity (HSI) color
space:

$$E_i(\phi_i) = \left[ TR(I_{frame}(i)) - TR(I_{pano}(\phi_i)) \right]^2 \tag{5}$$

where $\phi_i$ is the new panning view angle of the panorama at
the $i^{th}$ frame following a small update $\delta\phi_{i-1,i}$.
At an optimal $\delta\phi_{i-1,i}$, the difference between the
background region of the video frame and the panoramic view is
minimized. The segmented foreground $I_{fore}(i)$ is registered
by the change in the corresponding panoramic panning view angle,
$\delta\phi_{i-1,i}$. When the correct background view is found,
it is modelled again, followed by the segmentation process. The
result of the segmentation can in turn be used to find the next
background view. These procedures continue until all of the
frames are processed. Finally, the foreground sequence is
compressed by MPEG-1 coding.
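The minimization of Eq. (5) can be sketched as an exhaustive search over a small range of panning updates. This is a simplified illustration, not the search used in our system. It assumes the panning angle maps linearly to a column offset in the panorama, uses a single scalar channel per pixel instead of the three HSI channels, and takes the template region as a boolean mask of background pixels. The function names and the `max_step` parameter are hypothetical.

```python
def masked_ssd(frame, pano, offset, mask):
    """Sum of squared differences over the template region, as in Eq. (5)."""
    total = 0.0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if mask[r][c]:  # True marks a pixel inside the template region
                diff = value - pano[r][offset + c]
                total += diff * diff
    return total


def search_next_view(frame, pano, mask, prev_offset, max_step=3):
    """Return (best_offset, error) within max_step columns of the previous view."""
    width, pano_width = len(frame[0]), len(pano[0])
    best_offset, best_err = prev_offset, float("inf")
    for step in range(-max_step, max_step + 1):
        offset = prev_offset + step
        if offset < 0 or offset + width > pano_width:
            continue  # window would fall outside the panorama
        err = masked_ssd(frame, pano, offset, mask)
        if err < best_err:
            best_offset, best_err = offset, err
    return best_offset, best_err
```

Restricting the search to a few columns around the previous offset reflects the assumption that consecutive frames share almost the same background.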

3 Video Stream Reconstruction 

For reconstruction, we first decode the foreground frame
sequence to recover every $I_{fore}(i)$. The background scene is
obtained from the corresponding view of the panorama mosaic
with cylindrical projection, $I_{pano}(\phi_i)$, according to
the panning angle $\phi_i$. The viewer simply renders the
foreground object segments $I_{fore}(i)$ over the background
scene from the panorama to reconstruct the original frame, as
shown in Figure 6. In addition, our system provides a simple
and effective solution for scene-based video indexing through
the panning angle $\phi$. This approach can be considered a
complement to content-based (color and texture) indexing
methods.
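The rendering step amounts to a per-pixel selection between the decoded foreground and the panorama window. The sketch below illustrates this under simplifying assumptions: the foreground mask is available to the viewer, and the panning angle has already been converted to a column offset (the cylindrical projection is not modelled). The function name and argument names are hypothetical.

```python
def reconstruct_frame(fore, mask, background, offset=0):
    """Composite foreground pixels over the background window starting at offset."""
    height, width = len(mask), len(mask[0])
    return [
        [fore[r][c] if mask[r][c] else background[r][offset + c]
         for c in range(width)]
        for r in range(height)
    ]
```

Passing a different image as `background` gives the superimposition effect, since the foreground layer is independent of the scene it was captured in.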




Figure 6. Reconstructed video frame



4 Experimental Result & Discussion 

We have captured two video clips with 60 frames in each
clip. The first one is shown in Figure 1. Figure 7 shows the
second one with its segmentation results.

Figure 7. Video clip 2



4.1 Video Compression Performance 

Table 1 shows the storage sizes of the different components
involved in our system for video clip 1 and video clip 2. A
partial mosaic image of the background scene is used in each
of the experiments. For video clip 1 in Figure 1, the
compression ratio is:

$$CR_{1,pano} = 1 - \frac{[V_{1,pano}]}{[V_{1,ori}]} = 1 - \frac{313.3}{21873} = 98.57\% \tag{6}$$



Similarly for video clip 2 in Figure 7, the compression ratio 
is 98.49%. In general, CR depends on the scene complex- 
ity. Moreover, for a longer video clip, the overhead of the 
size of the mosaic image is relatively small and results in a 
better compression ratio. 
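The arithmetic of Eq. (6), and the effect of clip length on the ratio, can be checked with a short calculation. The first function follows Eq. (6) directly. The second is only a rough projection: it assumes, hypothetically, that the foreground and original sizes grow linearly with the number of frames at the per-frame rates of clip 1 while the panorama size stays fixed.

```python
def compression_ratio(encoded_kb, original_kb):
    """Compression ratio as defined in Eq. (6)."""
    return 1.0 - encoded_kb / original_kb


def projected_ratio(frames, pano_kb=21.8,
                    fore_kb_per_frame=291.6 / 60,
                    orig_kb_per_frame=21873 / 60):
    """Projected ratio for a clip of the given length (linear-growth assumption)."""
    encoded_kb = pano_kb + fore_kb_per_frame * frames
    return compression_ratio(encoded_kb, orig_kb_per_frame * frames)


print(f"{compression_ratio(313.3, 21873):.2%}")  # 98.57%
```

Under this assumption the fixed panorama cost is spread over more frames, so `projected_ratio(120)` is slightly higher than `projected_ratio(60)`.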

4.2 Quality of Reconstructed Video Sequence 

Table 1. Storage sizes of video clips 1 and 2

Items                        Size kb ($V_1$)   Size kb ($V_2$)
Original video [$V_{ori}$]   21873             21873
Panorama image [$I_{pano}$]  21.8              32.2
Foreground [$V_{fore}$]      291.6             329.2
Pano-encoded [$V_{pano}$]    313.3             361.4

Table 2. Means of HSI differences - clips 1 and 2

$I_a$       $I_b$          $V_1$     $V_2$
Original    Pano-encoded   0.0668    0.1231
MPEG-1      Pano-encoded   0.0598    0.1190
Original    MPEG-1         0.0290    0.0806

The measurement of image quality is based on the absolute
difference of each pair of corresponding pixels in the
Hue-Saturation-Intensity (HSI) color space. For two pictures
$I_a$ and $I_b$, we define the absolute difference between a
pair of corresponding pixels $I_a(x, y)$ and $I_b(x, y)$ to be
$E_{HSI}(I_a(x, y), I_b(x, y))$, which is normalised to fall
within the range $[0, 1]$. For entire images with $m \times n$
pixels, we define the normalized average difference
$\bar{E}_{HSI}(I_a, I_b)$ between them by:

$$\bar{E}_{HSI}(I_a, I_b) = \frac{1}{mn} \sum_{x,y}^{m,n} E_{HSI}(I_a(x, y), I_b(x, y)) \tag{7}$$

Table 2 shows the overall means of the normalized average
differences $\bar{E}_{HSI}$ for video clip 1 and video clip 2,
among the uncompressed original video sequence, the MPEG-1
compressed video sequence and the video sequence reconstructed
by our system.
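The average difference between two images can be sketched as below. The exact form of the per-pixel HSI difference is not spelled out in the text, so the `pixel_difference` function here is an assumption made for illustration: all three channels are taken to lie in [0, 1], hue is treated as circular, and the three channel differences are averaged so that the result stays within [0, 1]. Both function names are hypothetical.

```python
def pixel_difference(p, q):
    """Assumed per-pixel HSI difference in [0, 1]; p and q are (h, s, i) tuples."""
    dh = abs(p[0] - q[0])
    dh = min(dh, 1.0 - dh) * 2.0  # circular hue distance, rescaled to [0, 1]
    ds = abs(p[1] - q[1])
    di = abs(p[2] - q[2])
    return (dh + ds + di) / 3.0


def mean_difference(img_a, img_b):
    """Average of the per-pixel differences over two equally sized images."""
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for p, q in zip(row_a, row_b):
            total += pixel_difference(p, q)
            count += 1
    return total / count
```

A value of 0 means the two images are identical under this measure, and larger values indicate a larger average deviation per pixel.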

4.3 Superimposition from New Background




Figure 8. Applying new background 



By replacing the original panorama, we can synthesize various
virtual environments, as shown in Figure 8. Our system can
also provide interactive features, such as control of the
panning view angle and zoom factor, to explore the whole scene
or examine details of any particular frame. We may also allow
zooming and vertical panning of the camera. However, these
modifications raise problems in simulating out-of-focus
(depth of field) effects in the background, and in estimating
the zoom factor and the vertical panning angle of the camera.
Another promising direction is to investigate how to handle
complex scenes with multiple dynamic foreground objects.

5 Conclusion 

We have presented a method for performing superimposition
with the help of a background panorama. A video stream is
decomposed by color-based segmentation and represented as a
combination of a background panorama and foreground objects.
During reconstruction, the foreground segments are combined
with their corresponding views in the background panorama,
which can also be replaced by any desired image to perform
superimposition. Our experiments demonstrate the effectiveness
of this approach.